Here we show the code to reproduce the analyses of: Risso and Pagnotta (2020). Per-sample standardization and asymmetric winsorization lead to accurate classification of RNA-seq expression profiles. In preparation.
This file belongs to the repository: https://github.com/drisso/awst_analysis.
The code is released under the GPL v3.0 license.
The awst package can be installed from GitHub with the following.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("drisso/awst")
The collection of SEQC datasets is available through the seqc Bioconductor package, which can be installed with the following.
BiocManager::install("seqc")
We next build the data matrix from the “ILM_aceview” experiments. We remove duplicate gene symbols, ERCC spike-ins, and genes with no ENTREZ ID.
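The three filters described above can be sketched on a toy table. This is a minimal illustration, not the code used for the analysis: the data frame, its column names (`Symbol`, `EntrezID`, `IsERCC`) and the object `toy_matrix` are assumptions made for the example, and "removing duplicate gene symbols" is interpreted here as keeping the first occurrence of each symbol.

```r
# Toy stand-in for a gene-level count table (hypothetical column names).
counts <- data.frame(
  Symbol   = c("GENE1", "GENE2", "GENE2", "ERCC-00002", "GENE3"),
  EntrezID = c("101", "102", "102", NA, NA),
  IsERCC   = c(FALSE, FALSE, FALSE, TRUE, FALSE),
  A_1      = c(10, 5, 7, 100, 3),
  B_1      = c(12, 0, 9, 90, 4),
  stringsAsFactors = FALSE
)

# Keep rows that are not spike-ins, have an ENTREZ ID,
# and are the first occurrence of their gene symbol.
keep <- !counts$IsERCC & !is.na(counts$EntrezID) & !duplicated(counts$Symbol)
filtered <- counts[keep, ]

# Build a genes x samples matrix with gene symbols as row names.
toy_matrix <- as.matrix(filtered[, c("A_1", "B_1")])
rownames(toy_matrix) <- filtered$Symbol
dim(toy_matrix)
```

With real data, the same logical index would be applied once to the full table before the per-site count columns are combined.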
| | AGR | BGI | CNL | COH | MAY | NVS | Sum |
|---|---|---|---|---|---|---|---|
| A | 4 | 5 | 5 | 4 | 5 | 4 | 27 |
| B | 4 | 5 | 5 | 4 | 5 | 4 | 27 |
| C | 4 | 5 | 5 | 4 | 5 | 4 | 27 |
| D | 4 | 5 | 5 | 4 | 5 | 4 | 27 |
| Sum | 16 | 20 | 20 | 16 | 20 | 16 | 108 |
## [1] "A3_BGI" "D4_BGI" "D4_NVS" "A3_MAY" "C2_BGI" "A1_COH"
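A sample-by-site table like the one above can be derived from the column names alone. The sketch below assumes that each name has the form sample letter, replicate number, underscore, site code, which is inferred from the six names printed above and is not confirmed elsewhere in this document. The variable names are hypothetical.

```r
# The six column names printed above, used as example input.
sample_names <- c("A3_BGI", "D4_BGI", "D4_NVS", "A3_MAY", "C2_BGI", "A1_COH")

# First character: sample type (A, B, C, or D).
sample_type <- substr(sample_names, 1, 1)
# Everything after the underscore: sequencing site.
site <- sub("^[^_]*_", "", sample_names)

# Cross-tabulate sample type by site and add row and column totals.
tab <- table(sample_type, site)
addmargins(tab)
```

Applied to all 108 column names, the same two lines of parsing would give the full design table with its `Sum` margins.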
## [1] 15229 108
## [1] 1000 108
## end fraction
## clustered
## clustered
## clustered
## clustered
## Pearson's Chi-squared test
## X-stat = 108 (vCramer = 100%), df = 3, p-value = 2.95608e-23
## Pearson's Chi-squared test
## X-stat = 234.0248 (vCramer = 84.99%), df = 9, p-value = 2.33558e-45
## end fraction
## clustered
## clustered
## clustered
## clustered
## Pearson's Chi-squared test
## X-stat = 108 (vCramer = 100%), df = 3, p-value = 2.95608e-23
## Pearson's Chi-squared test
## X-stat = 324 (vCramer = 100%), df = 9, p-value = 2.09594e-64
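The Cramér's V percentages reported with the chi-squared tests above can be checked by hand with the usual formula, V = sqrt(X / (n (min(r, c) - 1))), where X is the test statistic, n the number of samples, and r and c the table dimensions. The sketch below is a hedged check, not the function that produced the output: `cramers_v` is a hypothetical helper, and the table shapes (2 x 4 for df = 3, 4 x 4 for df = 9) are assumptions inferred from the degrees of freedom.

```r
# Cramer's V from a chi-squared statistic (hypothetical helper).
cramers_v <- function(x_stat, n, n_rows, n_cols) {
  sqrt(x_stat / (n * (min(n_rows, n_cols) - 1)))
}

n_samples <- 108

# df = 3 is consistent with a 2 x 4 table (assumed shape).
v_df3 <- cramers_v(108, n_samples, 2, 4)
# df = 9 is consistent with a 4 x 4 table (assumed shape).
v_df9_partial <- cramers_v(234.0248, n_samples, 4, 4)
v_df9_perfect <- cramers_v(324, n_samples, 4, 4)

# Express as percentages, as in the output above.
round(100 * c(v_df3, v_df9_partial, v_df9_perfect), 2)

# Upper-tail p-value for the df = 3 test.
p_df3 <- pchisq(108, df = 3, lower.tail = FALSE)
```

Under these assumptions the three values come out as 100, 84.99 and 100, matching the reported figures. A statistic equal to n (min(r, c) - 1) is the maximum attainable, which is why X = 108 with df = 3 and X = 324 with df = 9 both correspond to perfect association.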
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ConsensusClusterPlus_1.50.0 awst_0.0.3
## [3] dendextend_1.12.0 cluster_2.1.0
## [5] knitr_1.26
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.3 highr_0.8 pillar_1.4.2
## [4] compiler_3.6.1 viridis_0.5.1 tools_3.6.1
## [7] digest_0.6.23 evaluate_0.14 lifecycle_0.1.0
## [10] tibble_2.1.3 gtable_0.3.0 viridisLite_0.3.0
## [13] pkgconfig_2.0.3 rlang_0.4.2 parallel_3.6.1
## [16] yaml_2.2.0 xfun_0.11 gridExtra_2.3
## [19] stringr_1.4.0 dplyr_0.8.3 grid_3.6.1
## [22] tidyselect_0.2.5 Biobase_2.46.0 glue_1.3.1
## [25] R6_2.4.1 rmarkdown_1.17 ggplot2_3.2.1
## [28] purrr_0.3.3 magrittr_1.5 BiocGenerics_0.32.0
## [31] scales_1.1.0 htmltools_0.4.0 assertthat_0.2.1
## [34] colorspace_1.4-1 stringi_1.4.3 lazyeval_0.2.2
## [37] munsell_0.5.0 crayon_1.3.4